imputation model
- North America > United States (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > China > Hong Kong (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
Multi-environment Invariance Learning with Missing Data
Learning models that can handle distribution shifts is a key challenge in domain generalization. Invariance learning, an approach that focuses on identifying features invariant across environments, improves model generalization by capturing stable relationships, which may represent causal effects when the data distribution is encoded within a structural equation model (SEM) and satisfies modularity conditions. This has led to a growing body of work that builds on invariance learning, leveraging the inherent heterogeneity across environments to develop methods that provide causal explanations while enhancing robust prediction. However, in many practical scenarios, obtaining complete outcome data from each environment is challenging due to the high cost or complexity of data collection. This limitation in available data hinders the development of models that fully leverage environmental heterogeneity, making it crucial to address missing outcomes to improve both causal insights and robust prediction. In this work, we derive an estimator from the invariance objective under missing outcomes. We establish non-asymptotic guarantees on the variable selection property and on $\ell_2$ error convergence rates, both of which are influenced by the proportion of missing data and the quality of imputation models across environments. We evaluate the performance of the new estimator through extensive simulations and demonstrate its application using the UCI Bike Sharing dataset to predict the count of bike rentals. The results show that despite relying on a biased imputation model, the estimator is efficient and achieves lower prediction error, provided the bias is within a reasonable range.
- North America > United States > District of Columbia > Washington (0.04)
- Asia > Middle East > Jordan (0.04)
Partial Inverse Design of High-Performance Concrete Using Cooperative Neural Networks for Constraint-Aware Mix Generation
Nugraha, Agung, Im, Heungjun, Lee, Jihwan
High-performance concrete requires complex mix design decisions involving interdependent variables and practical constraints. While data-driven methods have improved predictive modeling for forward design in concrete engineering, inverse design remains limited, especially when some variables are fixed and only the remaining ones must be inferred. This study proposes a cooperative neural network framework for the partial inverse design of high-performance concrete. The framework integrates an imputation model with a surrogate strength predictor and learns through cooperative training. Once trained, it generates valid and performance-consistent mix designs in a single forward pass without retraining for different constraint scenarios. Compared with baseline models, including autoencoder models and Bayesian inference with Gaussian process surrogates, the proposed method achieves R-squared values of 0.87 to 0.92 and substantially reduces mean squared error by approximately 50% and 70%, respectively. The results show that the framework provides an accurate and computationally efficient foundation for constraint-aware, data-driven mix proportioning.
- Materials > Construction Materials (1.00)
- Construction & Engineering (1.00)
Masking criteria for selecting an imputation model
Yang, Yanjiao, Suen, Daniel, Chen, Yen-Chi
Missing data is a common problem across various scientific disciplines, including medical research (Bell et al., 2014), social sciences (Molenberghs et al., 2014), and astronomy (Ivezić et al., 2020). To handle missing entries in the dataset, imputation (Grzesiak et al., 2025; Kim and Shao, 2021; Little and Rubin, 2019) is a popular approach that is widely accepted in practice. An imputation model generates plausible values for each missing entry, transforming an incomplete dataset into a complete one. The critical importance of this task has led to the development of a wide array of imputation models, grounded in various modeling assumptions. These range from traditional approaches like hot-deck imputation (Little and Rubin, 2019) to more sophisticated methods such as Multiple Imputation via Chained Equations (MICE; Van Buuren and Groothuis-Oudshoorn, 2011), random forest imputation (Stekhoven and Bühlmann, 2012), techniques based on Markov assumptions on graphs (Yang and Chen, 2025), and even generative adversarial networks (Yoon et al., 2018). Despite the proliferation of imputation models, the selection of an optimal imputation model for a given dataset remains a significant challenge, largely due to the unsupervised nature of the problem. Among the many proposed strategies for evaluating and selecting imputation models, masking has emerged as a particularly popular procedure (Gelman et al., 1998; Honaker et al., 2011; Leek et al., 2012; Qian et al., 2024; Troyanskaya et al., 2001; Wang et al., 2024). Masking involves intentionally creating missing values in observed entries to create a setting where imputation accuracy can be measured against a known ground truth.
This approach has demonstrated remarkable success and power in other domains, notably in language modeling (Devlin et al., 2019; Yang et al., 2019), image recognition (Hondru et al., 2025; Vincent et al., 2010; Xie et al., 2022), and prediction-powered inference (Angelopoulos et al., 2023; Wang et al., 2020).
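The generic masking procedure described in this abstract can be illustrated with a short sketch. It is not the criterion proposed by the authors; it only shows the basic idea of hiding observed entries, imputing them, and scoring the result against the hidden values. The function names (`mean_impute`, `zero_impute`, `masking_score`), the 20% masking rate, the RMSE score, and the toy Gaussian data are all assumptions made for illustration.

```python
import math
import random

def mean_impute(rows):
    """Fill each missing entry (None) with its column mean over observed entries."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        seen = [r[j] for r in rows if r[j] is not None]
        means.append(sum(seen) / len(seen))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def zero_impute(rows):
    """Deliberately poor baseline: fill every missing entry with 0."""
    return [[0.0 if v is None else v for v in r] for r in rows]

def masking_score(rows, impute, mask_frac=0.2, seed=0):
    """Hide a random share of observed entries, impute, return RMSE on the hidden ones."""
    rng = random.Random(seed)
    masked = [list(r) for r in rows]
    hidden = []
    for i, r in enumerate(rows):
        for j, v in enumerate(r):
            if v is not None and rng.random() < mask_frac:
                masked[i][j] = None          # artificially remove an observed value
                hidden.append((i, j))
    completed = impute(masked)
    sq_err = [(completed[i][j] - rows[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(sq_err) / len(sq_err))

# Toy data (hypothetical): 200 rows, 3 columns, about 10% of entries truly missing.
data_rng = random.Random(1)
data = [[data_rng.gauss(5.0, 1.0) if data_rng.random() > 0.1 else None
         for _ in range(3)] for _ in range(200)]

# The same seed gives both candidates the same mask, so the scores are comparable.
scores = {"mean": masking_score(data, mean_impute),
          "zero": masking_score(data, zero_impute)}
best = min(scores, key=scores.get)
```

Here the candidate with the lowest masked-entry error is selected. The sketch masks entries uniformly at random, which is only one possible masking scheme and need not resemble the real missingness pattern in a dataset.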
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.47)
Missing Data Multiple Imputation for Tabular Q-Learning in Online RL
Chasalow, Kyla, Wu, Skyler, Murphy, Susan
Missing data in online reinforcement learning (RL) poses challenges compared to missing data in standard tabular data or in offline policy learning. The need to impute and act at each time step means that imputation cannot be put off until enough data exist to produce stable imputation models. It also means future data collection and learning depend on previous imputations. This paper proposes fully online imputation ensembles. We find that maintaining multiple imputation pathways may help balance the need to capture uncertainty under missingness and the need for efficiency in online settings. We consider multiple approaches for incorporating these pathways into learning and action selection. Using a Grid World experiment with various types of missingness, we provide preliminary evidence that multiple imputation pathways may be a useful framework for constructing simple and efficient online missing data RL methods.
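To make the idea of multiple imputation pathways concrete, the following is a minimal sketch under strong assumptions and is not the authors' method. It assumes a hypothetical 5-state chain environment instead of a Grid World, rewards that are always observed, state observations that go missing completely at random with probability `P_MISS`, imputation by sampling from shared empirical transition counts, and action selection on Q-values averaged over pathways. All names and constants are illustrative.

```python
import random

N_STATES, N_ACTIONS, K = 5, 2, 4          # chain length; actions 0=left, 1=right; pathways
ALPHA, GAMMA, EPS, P_MISS = 0.1, 0.9, 0.1, 0.3

def env_step(state, action):
    """Hypothetical chain environment: reward 1 whenever the agent lands on the right end."""
    nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    return nxt, float(nxt == N_STATES - 1)

def run(n_steps=5000, seed=0):
    rng = random.Random(seed)
    # One Q-table per imputation pathway.
    q = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(K)]
    # Shared transition counts counts[s][a][s'] with an add-one prior.
    counts = [[[1.0] * N_STATES for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    true_s, prev_observed = 0, True
    belief = [0] * K                       # each pathway's current (possibly imputed) state
    for _ in range(n_steps):
        # Epsilon-greedy on Q-values averaged over pathways, with random tie-breaking.
        avg = [sum(q[k][belief[k]][a] for k in range(K)) / K for a in range(N_ACTIONS)]
        if rng.random() < EPS:
            action = rng.randrange(N_ACTIONS)
        else:
            action = rng.choice([a for a in range(N_ACTIONS) if avg[a] == max(avg)])
        prev_s = true_s
        true_s, reward = env_step(true_s, action)
        missing = rng.random() < P_MISS    # is the next-state observation missing?
        for k in range(K):
            s = belief[k]
            if missing:
                # Each pathway draws its own imputed next state.
                s_next = rng.choices(range(N_STATES), weights=counts[s][action])[0]
            else:
                s_next = true_s
            target = reward + GAMMA * max(q[k][s_next])
            q[k][s][action] += ALPHA * (target - q[k][s][action])
            belief[k] = s_next
        # Update the imputation model only from fully observed transitions.
        if not missing and prev_observed:
            counts[prev_s][action][true_s] += 1.0
        prev_observed = not missing
    return q

q_tables = run()
```

Because each pathway samples its own imputations, the pathways can disagree about the current state after a missing observation and re-synchronize at the next observed one. Averaging Q-values is only one way to combine pathways for action selection; the abstract notes that several such approaches are considered.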
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
Markov Missing Graph: A Graphical Approach for Missing Data Imputation
We introduce the Markov missing graph (MMG), a novel framework that imputes missing data based on undirected graphs. MMG leverages conditional independence relationships to locally decompose the imputation model. To establish the identification, we introduce the Principle of Available Information (PAI), which guides the use of all relevant observed data. We then propose a flexible statistical learning paradigm, MMG Imputation Risk Minimization under PAI, that frames the imputation task as an empirical risk minimization problem. This framework is adaptable to various modeling choices. We develop theories of MMG, including the connection between MMG and Little's complete-case missing value assumption, recovery under missing completely at random, efficiency theory, and graph-related properties. We show the validity of our method with simulation studies and illustrate its application with a real-world Alzheimer's data set.
- North America > United States > Washington > King County > Seattle (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
DIM-SUM: Dynamic IMputation for Smart Utility Management
Hildebrant, Ryan, Bhope, Rahul, Mehrotra, Sharad, Tull, Christopher, Venkatasubramanian, Nalini
Time series imputation models have traditionally been developed using complete datasets with artificial masking patterns to simulate missing values. However, in real-world infrastructure monitoring, practitioners often encounter datasets where large amounts of data are missing and follow complex, heterogeneous patterns. We introduce DIM-SUM, a preprocessing framework for training robust imputation models that bridges the gap between artificially masked training data and real missing patterns. DIM-SUM combines pattern clustering and adaptive masking strategies with theoretical learning guarantees to handle diverse missing patterns actually observed in the data. Through extensive experiments on over 2 billion readings from California water districts, electricity datasets, and benchmarks, we demonstrate that DIM-SUM outperforms traditional methods by reaching similar accuracy with lower processing time and significantly less training data. When compared against a large pre-trained model, DIM-SUM averages 2x higher accuracy with significantly less inference time.
- North America > United States > California > Orange County > Irvine (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Energy (1.00)
- Government > Regional Government (0.46)
- Water & Waste Management > Water Management > Water Supplies & Services (0.34)